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Abstract — Discontinuity preserving smoothing is a fundamentally important procedure that is useful in a wide variety of image processing 
contexts. It is directly useful for noise reduction, and frequently used as an intermediate step in higher level algorithms. For example, it can 
be particularly useful in edge detection and segmentation. Three well known algorithms for discontinuity preserving smoothing are nonlinear 
anisotropic diffusion, bilateral filtering, and mean shift filtering. Although slight differences make them each better suited to different tasks, 
all are designed to preserve discontinuities while smoothing. However, none of them satisfy this goal perfectly: they each have exception 
cases in which smoothing may occur across hard edges. The principal contribution of this paper is the identification of a property we call 
edge awareness that should be satisfied by any discontinuity preserving smoothing algorithm. This constraint can be incorporated into 
existing algorithms to improve quality, and usually has negligible changes in runtime performance and/or complexity. We present modifications 
necessary to augment diffusion and mean shift, as well as a new formulation of the bilateral filter that unifies the spatial and range spaces to 
achieve edge awareness. 

Index Terms — Smoothing, image processing, computer vision, discontinuity preserving, mean shift, bilateral, diffusion. 
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1 Introduction 

DISCONTINUITY preserving algorithms attempt to 
smooth an image while preserving crisp edges. They 
are useful for many tasks, such as noise reduction, edge 
detection, and segmentation. Some very robust results have 
been obtained by treating the smoothing process as a global 
optimization problem, where the objective is to recover a 
smooth image that has high fidelity to the original cor- 
rupted image. Typically, the energy function to be mini- 
mized is a summation over all pixels in the image with two 
terms, one of which is minimized by a perfectly smooth 
image, and the other by perfect fidelity to the original 
corrupted image. 

It has been shown that the solution that minimizes such 
a function is also the maximal a posteriori solution to the 
problem, as defined by the prior and data terms |9| |7| |13|. 
Although this is an NP-hard |10| non-convex optimization 
problem IllJ, many novel algorithms have been proposed 
which can efficiently find good global minima in poly- 
nomial time, including Mean Field Annealing (MFA) |9| 
(22) (2TJ, Graduated Non-Convexity (GN C) p2|, B elief 
Propagation (BP) (13}, and Graph Cuts (GC) |14| (15) (16). 
Earlier works have also identified specific line and outlier 
processes, further discussed in (11). 

However, the global approaches have some significant 
disadvantages: they are often difficult to generalize to 
vector-valued images, they are generally very inefficient, 
they often have trouble maintaining perfectly smooth gra- 
dients, and fine structure may be lost. 

In this paper, a few algorithms are discussed which have 
emerged as some popular efficient alternatives to global 
optimization. The algorithms we consider are variable con- 
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ductance diffusion (17) (18) (19) £( (20), bilateral filtering 
(8) (20), and mean shift filtering (3). 

An example of filtering using each of these methods on 
the noisy Apollo image is shown in Fig. [T] As can be seen 
from this test image, each method can be effectively used to 
reduce noise while preserving sharp edges. Although these 
algorithms perform the same fundamental task, there is no 
clear winner between them - they all perform well, and 
individual differences between them (discussed in more 
detail in the following sections) are significant enough that 
each has found a niche where it may be superior to the 
others. 

The mean shift filter produces more crisp lines in homo- 
geneous regions because it was originally designed for a 
somewhat ulterior purpose - it is an adaptation of mean 
shift clustering for fine-scale segmentation of color images. 
Thus, smooth gradients are avoided, because pixel colors 
represent segments, and it is only a coincidence that this 
also appears to be an effective mechanism for discontinuity 
preserving smoothing. 

The bilateral filter and diffusion methods appear to pro- 
duce results of similar quality, but they are not equal; 
the bilateral filter can produce a greater contrast between 
sharp edges and very smooth regions, and is more efficient 
on massively parallel architectures. Conversely, diffusion 
is more efficient on single processor machines, and has 
a more physically meaningful result because it closely 
approximates the physical diffusion of densities through a 
medium. 

We make no attempt to compare or rank the aforemen- 
tioned algorithms against each other. However, we have 
noticed that that there are certain limitations to each algo- 
rithm which can result in unwanted effects, and examples 
of these issues are demonstrated on a specially designed 
Challenge test image, shown in Fig. [2] 

There are several notable features of this test image. In 
the upper region, there is a gradient going from the white 




(a) Original 



(b) Diffusion 




(c) Bilateral 



(d) Mean shift 



Fig. 1 . Original (a) and filtered result of Apollo image using: (b) 5 iterations of diffusion with A = 0.03; (c) regular bilateral filter with a s = 3, a R = 45; (d) 
regular mean shift filter with h s = 5 and h r = 45. 



background to a solid red color, adjacent to an orange 
rectangle, which is separated by a thin black line. Because 
the red and orange regions are clearly separated, we claim 
that smoothing should not occur across these regions. 

In the bottom half of the image there are four black 
rectangles each containing noise with a different domi- 
nant color: red, green, yellow, or purple. We claim that a 
smoothing algorithm should reduce the noise within these 
rectangles leaving each one filled with a solid color, and 
also preserve the crisp edges of the rectangles. 

Surrounding the rectangles is more brightly colored 
noise, but the noise is not uniform - the area is divided 
into blobs with distinctly dominant colors. We claim that 
a smoothing algorithm should reduce this to a smoothly 
varying colored surface. 

In the right region, there are 6 distinctly colored rectan- 
gles containing low levels of zero-mean noise. This presents 
a particularly simple task for a discontinuity preserving 
smoothing algorithm, and is shown here as somewhat of 
a control. 
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(a) Original (b) Diffusion (c) Bilateral (d) Mean shift 

Fig. 2. Synthetic Challenge (a) test image exhibiting some properties 
that cause existing methods to perform poorly: (b) variable conductance 
diffusion blurs hard edges (intentionally exaggerated here for illustration 
purposes); (c) the bilateral filter loses saturation and fails to smooth in high- 
noise regions; (d) mean shift mangles some of the edges. 

From Fig.[2j}, we see that successive iterations of diffusion 



erodes even the hardest edges, due to the Gaussian blurring 
of derivatives (we have used a blurring radius that makes 
this effect noticeable, although it could be tweaked to 
produce a nicer image). From Fig. |2j:, we see that the 
bilateral filter reduces saturation, produces an unwanted 
new color by blending the red and orange sections, and 
has trouble smoothing the highly noisy colored areas. From 
Fig. |2ji, we see that the mean shift filter mangles the black 
border edge features. 

In this paper, we investigate the cause of these unwanted 
effects, and show that most of them can be related to 
the violation of a certain principle which we refer to as 
edge awareness. We propose modifications to correct each 
algorithm with little or no changes in time complexity, 
and demonstrate that these modifications can significantly 
improve visual quality of smoothing. Specifically, we will 
show that 

1) Edge awareness is a concept that can be used to reduce 
estimation bias in image restoration. 

2) Diffusion is naturally edge aware, but becomes pro- 
gressively less so with each iteration. This can be fixed 
to preserve crisp edges after successive iterations, 
at minor additional expense and no change in time 
complexity (as shown in Section [6j. 

3) Edge awareness can be incorporated into the concept 
of the bilateral filter in a very intuitive way to pro- 
duce a novel algorithm that unifies the spatial and 
range spaces. Our modification multiples the time 
complexity as compared to a standard bilateral filter 
by 0(log d), for a neighborhood of size d (as shown in 
Section [7). 

4) Edge awareness can be partially incorporated into the 
mean shift filter with minimal change in running time 
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by preventing mean shift paths from crossing strong 
intensity boundaries (as shown in Section [8|. This 
prevents unwanted artifacts from appearing. 

2 Edge Awareness 

In order to be called discontinuity or edge preserving, a 
smoothing algorithm needs only to smooth areas without 
edges more than the areas with edges. This leaves con- 
siderable flexibility in interpretation, and many algorithms 
satisfying this criterion exist. However, we argue that a 
more restrictive criterion may be better used to describe 
an unbiased discontinuity preserving smoothing algorithm, 
which we refer to as edge awareness. 

Specifically, the criterion for edge awareness is: 
The influence that a pixel in the input image has on 
the output at another pixel location should be inversely 
related to the strength of the largest edge(s) separating 
those pixels in the input image. 

This criterion has statistical rationale behind it for noise 
removal. The process of smoothing can be equated to 
estimating the sample mean of the color reflectance dis- 
tribution at each pixel, and discontinuities in an image are 
reliable indicators of a change in the underlying reflectance 
distribution. Without any a priori knowledge of the distri- 
butions contained within an image, disjoint regions must be 
assumed to come from disjoint distributions, and samples 
from one distribution cannot possibly help in estimating the 
mean of another distribution. Therefore, pixels which are 
separated by a discontinuity, and hence not likely to come 
from the same distribution, should not influence each other. 

Arguably, there are many cases where a single reflectance 
distribution could be responsible for generating samples 
at disconnected regions in the image. For example, in 
the regions separated by veins in a leaf, or the out of 
focus background in a photograph separated by occluding 
foreground objects (see images A and B in Fig. |3j. In these 
cases the quality of the sample mean could be improved by 
using the samples from both distributions (eg, smoothing 
between disjoint regions). 





Fig. 3. Color distributions in real images: (a) the veins of a leaf separate 
regions of green having the same underlying distribution; (b) a foreground 
twig separates regions of a background out-of-focus, having a uniform 
distribution; (c) a crowd of people with many similar but differently colored 
body parts; (d) a collection of buttons with similar but different distributions 
of red. 



However, an unbiased smoothing algorithm should not 
assume that two distributions are the same just because 
their sample means are similar, because this is frequently 
not the case. As an example, consider a crowd of people: 
people tend to have similar skin colors which could appear 
close together in the image, but a smoothing algorithm 



should not change a person's skin color by blending them 
together (see examples C and D in Fig. pb. 

A smoothing algorithm that is not edge aware imposes 
the additional, often unwanted, assumption that distribu- 
tions having similar means are the same distribution. Doing 
so can result in the creation of novel colors that did not 
exist in the original image, which is counter productive if 
the purpose of smoothing is to recover a better estimate of 
the underlying process by removing noise (as it frequently 
is). 

3 Variable Conductance Diffusion 

Variable conductance diffusion Qv) g8) (19) (20) has 
long been used as a method of discontinuity preserving 
smoothing for images. It is inspired by the diffusion equa- 
tion 1 3.1 1, and uses a heuristic measure of conductivity 



c that is somehow based on the local image derivative 
magnitude. 



df(x,y) 
dt 



V-(C(x,y)Vf(x,y)) 



(3.1) 



For gray level images, the diffusion equation directly 
translates into an iterative discontinuity preserving smooth- 
ing algorithm using finite difference approximations to the 
derivatives, where f(x, y) is image intensity, and conduc- 
tivity is a decreasing function of local gradient magnitude 

mm, 



C(x,y)=g(\\Vf(x >y )\\) 



(3.2) 



In our implementation, we deviate slightly by using a 
vector-valued conductivity, 



df(x,y) 
dt 



V ■(C(x,y) T Vf(x,y)) 



d_ 
dx 



c x {x,y) 



df(x,y) 
dx 



d_ 
dy 



df(x,y) 
dy 



(3.3) 

and we make smoothing in each dimension a function of 
the partial derivative, so that C(x,y) — [c x (x, y)c y {x 1 y)] T ■ 
The components c x (x,y) (and correspondingly c y (x,y)) are 
defined by 



c x [x,y) = exp -A|| 



t d f(x,y) 
dx 



(3.4) 



For color images, the conductance in each channel can be 
made a function of the overall intensity gradient. Alterna- 
tively, Sapiro & Ringach [6| have proposed a method based 
on the local structure tensor. 

VCD is sometimes confused with anisotropic diffusion, 
which is almost identical but explicitly diffuses in the 
direction tangent to the local edge |7). In practice, all of 
these diffusion methods have similar performance. 

The diffusion process attempts to smooth in a way that 
respects intensity boundaries by making the amount of 
smoothing related to the magnitude of the derivatives. As 
a result, colors should not be blurred together unless there 
exists some path between them that does not significantly 
go against any local image gradients. Therefore, diffusion 
is edge aware in theory. 
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3.1 Limitations of Diffusion 

Finite differences are first order approximations of the true 
derivative, and iterative use causes these approximation 
errors to become magnified. One way to help mitigate 
this problem is to alternate between the causal and non- 
causal forms of the derivative on each iteration. However, 
noisy images may contain a large degree of high frequency 
content so that this is generally not enough to eliminate 
artifacts. 

In practice, a Gaussian blurring step is used before 
computing the finite difference derivative approximations 
|6|. This naturally makes the derivatives more stable by 
considering a slightly lower frequency. If noise is mostly 
high frequency, this also makes the derivatives more repre- 
sentative of the underlying signal. 

Unfortunately, because edges are inherently high fre- 
quency, lower frequency derivatives will not be able to 
accurately represent crisp edges, which will lead to slight 
blurring of edges with each iteration (as in Fig. |2j?). Al- 
though this blurring effect can be mitigated by using a very 
small sigma, it will always be present. Thus, the algorithm 
becomes progressively less aware of edges / discontinuities 
with each iteration, and will eventually break down and 
converge to smoothing everywhere. In this paper, we in- 
tentionally use a blurring sigma that is large enough to be 
noticeable. 

4 The Bilateral Filter 

The bilateral filter was developed by Tomasi & Manduchi 
for discontinuity preserving smoothing as a non-iterative 
alternative to anisotropic diffusion |8|. The primary claim 
for improvement over anisotropic diffusion is that the 
iterations in diffusion raise issues of stability, as small 
errors in the derivative estimation can become magnified 
over time, producing a less smooth image. Additionally, 
iterative algorithms can be inefficient for massively parallel 
architectures with limited communication. 

The bilateral filter overcomes these issues by considering 
a single fixed neighborhood of influence around each pixel, 
and estimating the influence of each neighbor by a com- 
bined spatial and range weighting. Specifically, each output 
pixel is replaced by a doubly-weighted average using the 
spatial distance and range (color) distance of the original 
neighbors. For each pixel coordinate s, the output color J s 
is given by 



Js 



(4.1) 
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where ft is the set of image pixels, I p is the RGB intensity 
at pixel p in the input image /, and A; is a normalization 
term. / and g could be any ID kernel functions, but in 
practice g is a true Gaussian with standard deviation an 
and / is a truncated Gaussian (for computational efficiency) 
with standard deviation as- 

Because it is not iterative and does not rely on derivatives, 
this makes it more numerically stable, but the use of large 
overlapping neighborhoods is also much more inefficient; 
unlike the Gaussian blur, this filter cannot be separated into 
a horizontal and vertical pass for increased efficiency. 



4.1 Limitations of the Bilateral Filter 

The bilateral filter is completely unaware of edges because 
neighboring pixels are averaged together based only on 
their color similarity, with no regard to any edges that 
might be separating them. As the spatial radius gets larger, 
the likelihood of two similarly colored pixels in the neigh- 
borhood being separated by some kind of discontinuity 
grows very quickly. This results in blurring across edges. 
In addition to producing novel colors, this can produce 
oddly surrealistic effects because it often blurs the interior 
of foreground objects with the background, as if it were a 
line drawing. 

Even if the foreground and background colors are notice- 
ably different, a high noise content may necessitate using a 
large color bandwidth, so that these colors are still blurred, 
resulting in noticeable color bleeding across boundaries. 
Also, opposing bright colors with similar intensity may be 
blurred together, resulting in muted grays and browns in- 
stead of the original vibrant colors (this is demonstrated by 
the noisy background in Fig. |2j. This effect will be referred 
to as color cancellation, and results in loss of saturation. 

Some images are more prone to these issues than others. 
If an image does not contain nearby opposing colors, has 
a sparse histogram, or has a very high signal to noise 
ratio, then the bilateral filter may producing very visually 
appealing results. However, this does not always mean 
that the results are accurate, because color bleeding across 
boundaries will still result in changing the average colors 
even if they still look good to a human. 

It should be noted that, in some contexts, blurring across 
boundaries is considered an advantage rather than a limita- 
tion. For example, if the objective is to generate a cartoon- 
like image rather than produce a realistic smoothing. 

5 The Mean Shift Filter 

Mean shift clustering [3] was originally developed as a 
method of clustering multi-dimensional data using an it- 
erative hill-climbing technique to assign each pattern to 
the nearest mode in a kernel density estimate of the global 
pattern distribution. A kernel density estimate is essentially 
an estimate of the probability mass function that generated 
a set of data points, and can be recovered by convolving a 
Gaussian with each data point. 

Conceptually, the mean shift vector is a step along the 
gradient of the kernel density estimate towards the nearest 
mode. A multi-dimensional mean value is initialized by the 
data point itself, and then a new multi-dimensional mean 
that is closer to the mode is computed by the average of 
all patterns weighted by their distance to the current multi- 
dimensional mean. 

Formally, the mean shift vector m(x) at point x, which is 
the vector towards the next point on the path to the mode 
1 3 1, is usually written as, 



m(x) 



E"=i x i9 1 



2 ) 



2 ) 



(5.1) 



for a set of n patterns X4, ...,x n and spatial bandwidth h. g 
is any ID kernel, but a Gaussian is typically used. 
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Mean shift clustering is particularly appealing because it 
is not only one of the few clustering algorithms that is not 
biased towards hyper-spherical or even hyper-ellipsoidal 
clusters, but is actually non-parametric (meaning that it 
is completely unbiased to the distribution) and still quite 
robust. 

Each pixel in an image can be thought of as a 5- 
dimensional point characterized by (x, y, r, g, b), and mean 
shift clustering can be used to find the modes of that 
distribution. When it is used in this context, it is known 
as mean shift filtering |3], although it is more effective to 
compute distance as the sum of Euclidean distances in 
the spatial and range components separately, as opposed 
to computing the 5-dimensional Euclidean distance. There- 
fore, two parameters are needed to specify the bandwidth 
(or equivalently standard deviation) in those two domains. 

Although mean shift clustering is effective for clustering 
many generic data sets, in the context of image processing 
mean-shift filtering tends to find too many modes for a 
suitable segmentation - but it does make a good step in 
the right direction. One of the most popular method of 
segmenting images, mean shift segmentation, first uses the 
mean shift filter, and then uses the union-find algorithm 
with region adjacency graphs to cluster the filtered image 
pixels (in the 'fusion' step) |3j. 

5.1 Limitations of Mean Shift 



Despite the very different form presented in 1 5.1 1 from 



| |4.1) , the mathematics of mean shift filtering are actually 
quite similar to that of bilateral filtering, except: where 
the bilateral filter used a bilateral weighting of the local 
neighborhood as a replacement to the pixel at the origin, 
the mean shift filter uses a bilateral weighting of the local 
neighborhood to compute the next step towards the local 
kernel density estimate of the multivariate image intensity 
distribution. 

In fact, if Si is used to denote the ith shifted position of 
the mean starting at pixel s, then mean shift filtering can 
be written in the same form as the bilateral filter, because 
the bilateral filter is essentially identical to 1 iteration of the 
mean shift filter: 



(5.2) 
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where sf is the spatial component of the mean, and sf- is 
the range component. This similarity seems to have been 
previously overlooked in the literature. 

Because the mean shift algorithm shifts the location of the 
mean on each iteration until a local mode of the distribution 
is found, the range of influence is not restricted to pixels in 
the initial neighborhood of a pixel - theoretically, any pixel 
in the input image could influence the color of any pixel in 
the filtered output image. 

In practice, the spatial bandwidth used to compute av- 
erages in mean-shift filtering can be much smaller than 
for bilateral filtering because the window moves iteratively. 
Because images contain many modes, the number of mean 
shift iterations is usually not too large, and as a result mean- 
shift filtering can actually be significantly more efficient 



than bilateral filtering for comparable ranges of smoothing 
influence. In addition, because mean-shift filtering does hill- 
climbing on the underlying image distribution, the mean 
shift path has a tendency to not cross large boundaries in 
the image, which allows it to perform blurring over larger 
effective ranges than a bilateral filter without significant 
color bleeding across boundaries. 

By replacing each pixel with the color of its nearest asso- 
ciated mode, the mean-shift filter tends to reduce gradients 
in the original image. However, it also can create gradients 
that did not previously exist between two regions if their 
color difference is small relative to h r . If the filter is being 
used as a precursor to clustering (such as is the case in 
the popular mean shift segmentation algorithm [3]), then 
reducing gradients can be an important property because 
pixels on a gradient should logically be clustered together. 

Because the mean-shift path essentially follows the gra- 
dient towards the mode, it tends to not cross boundaries 
in the original image, but there is no guarantee that it 
won't. As the spatial bandwidth becomes larger, it becomes 
increasingly likely that it will, for two reasons. The first 
reason is that because the mean shift uses a bilateral 
weighting, it may give high weighting to pixels that are 
separated by an intensity border. However, this is less likely 
to be a significant problem because the spatial bandwidth 
is generally much smaller when using mean shift than the 
bilateral filter. 

The second, more prominent reason, is that there is noth- 
ing preventing the spatial mean of similarly colored nearby 
pixels from shifting to a location which has a drastically 
different pixel color. This happens a lot around non-convex 
colored image features, such as the black rectangles in Fig. 
|5Ji. The most obvious conceptual example is a donut shaped 
feature: suppose the mean shift vector is attempting to 
converge to the mode of this distribution. If the spatial 
bandwidth is large relative to the donut, then the curvature 
of the donut will cause the spatial mean to fall within the 
center of the donut. 

Because mean shift uses a multi-variate mean, this won't 
immediately change the color of the range component of the 
mean. However, after the spatial component converges to 
the center of the donut, the color component will slowly 
converge to the color of the interior of the donut because 
of the high weighting from the spatial component of the 
bilateral weighting. The end result of this effect is that a 
pixel which started out on the donut is assigned to the 
mode of a completely different distribution: that of the 
background behind the donut, and this is a violation of 
edge awareness. 

These problems essentially limit the range of bandwidths 
that can be used with good results in mean shift, restricting 
it to relatively small scale (high frequency) kernel density 
estimates. 

6 Edge Aware Modifications for 
Diffusion 

Our edge aware modification to diffusion is extremely 
simple: rather than using a Gaussian blur to condition the 
derivatives, we propose to use a bilateral filter (discussed 
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in more detail in Section Although we have already 
illustrated some exceptions in which the bilateral filter 
does not properly account for edges in Section 4.1 these 
limitations are only present when using a large spatial 
support. The blurring process here is only used to fix pixel 
discretization errors, so a small fixed size bilateral filter using 
only the adjacent neighborhood is sufficient. In all examples 
in this paper, we use a 5 x 5 neighborhood with a spatial 
standard deviation of 0.5. At this small scale, the limitations 



of the bilateral filter (discussed in Section 4.1 1 are irrelevant. 



6.1 Performance Analysis of Edge Aware Diffusion 

The original diffusion equation is extremely efficient for 
sequential machines because, in each iteration, color is 
propagated locally based on derivatives. This eliminates the 
need to consider large neighborhoods. Each iteration can 
be done in low 0(n) time. Thus, for a range of influence 
of i pixels, the time complexity of the original algorithm is 
0(ni). 

The Gaussian blur is a linearly separable operator which 
can be computed in two passes using one-dimensional 
kernels for improved performance, whereas the bilateral 
filter used in the proposed edge aware modified version 
cannot be. However, because the neighborhood considered 
by the bilateral filter is a small fixed size, this can be treated 
as a constant. Therefore, switching to a bilateral filter does 
not change the time complexity of the algorithm, and only 
slightly increases the overall constant factor. 

6.2 Experimental Results of Edge Aware Diffusion 

The original and edge aware modified diffusion filters 
are tested on a selection of images in Fig. [4] We have 
intentionally used a blurring radius that is large enough 
to exaggerate the smoothing out of edges in the origi- 
nal diffusion algorithm. This makes the improvements of 
our edge aware diffusion algorithm, which preserves hard 
edges even with large blurring radius, more noticeable. It 
should be noted that a better looking result could have been 
obtained using the regular diffusion algorithm by using a 
smaller blurring radius. 

The algorithms are also compared on the Challenge image 
(Fig. [5] and Fig. |6j while varying the only two free parame- 
ters: A, the diffusion coefficient, and i, the number of filter 
iterations. 

The edge aware version performs better than the original 
across the gamut by preserving edges while still smoothing 
other areas. As expected, reducing A reduces the tolerance 
for edges and increases overall smoothing in both algo- 
rithms (nonlinear ly). 

However, it is interesting to note that, in the edge aware 
version, extremely low values of A cause edge duplication. 
This behavior is not important because such low values of A 
do not make sense for discontinuity preserving smoothing 
anyway. 

7 Edge Aware Modifications for 
the Bilateral Filter 

Consider the 4-connected image graph where pixels are 
nodes and edges are weighted by the Euclidean distance 



between pixel colors. Then there exists a shortest path on 
the graph between any two pixels, and the length of this 
path is a representation of the combined spatial and range 
distance between the pixels which takes into account any 
discontinuities between them. 

A traditional bilateral filter uses a spatial distance weight 
multiplied by a color dissimilarity weight. We propose 
to replace this bilateral weighting of range and distance 
treated separately by a single weighting of the shortest path 
distance on the image graph. This gracefully combines the 
two types of distances while properly accounting for edge 
awareness. 

If the shortest path distance between pixels p and s is 
denoted by d(p, s), then the modified bilateral filter can be 
written as 



J s = ^^2g{d(p,s))I p 



(7.1) 
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Writing the filter in this way suggests some other in- 
teresting filters, as well. For example, consider using the 
maximum edge distance instead of the sum of edge dis- 
tances as the input to the Gaussian for weighting. Such 
a filter would eliminate gradients in the source image, 
while preserving solid colored regions, which might have 
interesting applications for image segmentation. 

7.1 Performance Analysis of Edge Aware 
Bilateral Filter 

The time complexity of processing an image with the 
regular bilateral filter is 0{nd), where n is the number of 
pixels and d is the number of pixels defined in the (square) 
local neighborhood. Note that d = 4i 2 in order to have a 
range of influence of i, so it is significantly more expensive 
than diffusion, which has complexity 0(ni) (for single pro- 
cessing architectures). In a massively parallel architecture, 
the bilateral filter could be done in O(d) = 0(i 2 ) time and 
0(1) communication, whereas diffusion would take 0(i) 
time and 0(i) communication. 

Determining the path distance from a pixel p to each 
neighbor q e N p is an instance of the single-source shortest- 
paths problem |5j. It can be computed efficiently with 
Dijkstra's algorithm [4] in 0(m log n) time [5|, for a graph 
with n nodes and rn edges. Using the 4-connected image 
graph, each node contributes < 2 edges (eg, the edge above 
and to the left), so the total number of edges will be < 2d 
(note that \N. p \ « d). Thus, the time complexity for solving 
the single-source shortest-paths problem on a 4-connected 
image graph is 0(d log d). Computing the bilateral weighted 
average would take 0(d) time already, so this is only an 
additional factor of 0(\ogd). 

Because of the special circumstances of this problem, 
some optimizations can be made such that the average 
case performance is significantly better than the worst case 
performance. First, it should be noted that the spatial range 
(which determines the size of d) is much less important, 
because there is no mathematical reason to induce an 
arbitrary cutoff on the range of influence. Rather, it is more 
intuitive to specify the range in terms of total path length. 
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Fig. 4. Comparison between nonlinear anisotropic diffusion filtered results on input image (left column) using standard algorithm (middle column) and edge 
aware modified algorithm (right column). A = 0.002 and iterations=5 for all tests. For smoothing prior to taking derivatives, as = 0.5, and in the case of 
bilateral filter a R = 55. 



A = 0.0005, i = 1 A = 0.0005, i = 2 A = 0.0005,? = 3 A = 0.0005,i = 4 A = 0.0005,i = 5 
A = 0.0105, i = 1 A = 0.0105, i = 2 A = 0.0105, i = 3 A = 0.0105, t = 4 A = 0.0105, i = 5 





A = 0.0500, i = 1 A = 0.0500, i = 2 A = 0.0500,i = 3 A = 0.0500,i = 4 A = 0.0500,i = 5 
Fig. 5. Nonlinear anisotropic diffusion results on Challenge image using standard algorithm. 
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If path length is used to specify the range of influence, 
then it is only necessary to compute the distance to pixels 
q € N p if d(p, q) < tr, where tr is a cutoff based on 
the range standard deviation or. In these experiments, 
tji = 3<7ft. Because paths are found in increasing order 
using Dijkstra's algorithm |4], this means that computation 
can terminate as soon as a path having length > tr is found. 

In real images which do not usually contain large regions 
where each pixel has exactly the same intensity, there will 
only be a constant number of pixels that can be reached 
from p without exceeding the maximum path cost tr 
regardless of spatial bandwidth. Therefore, the average case 
time complexity is lim^oo 0(d\ogd) — 0(1). However, 
to protect the efficiency from suffering from a very high 
constant in homogeneous regions, a spatial cutoff should 
still be used in addition to tr. 



7.2 Experimental Results of Edge Aware Bilateral Filter 

The original bilateral filter and the edge aware version are 
compared on a few test images in Fig. [7] A large spatial 
bandwidth is used specifically to highlight the limitations 
of the bilateral filter. 

All of the images filtered with the standard bilateral filter 
exhibit a loss of saturation. In the Froud and Egg tests, 
this creates a somewhat surrealistic looking effect, because 
foreground colors are blended with background colors. In 
the Paris test, this is particularly noticeable in the flowers 
which acquire an orange hue. In the Eggs image, the color 
of the straw noticeably bleeds into the eggs. The bilateral 
filter does a fairly good job on the Nest image (due to the 
extremely simple histogram), but there is still some visible 
cloudy gray effects in the sky to the right and left of the 
nest (as well as desaturation of the sky). 

In contrast, using the edge aware improved version based 
on shortest path distance, colors maintain their bright sat- 
uration in all of the tests in Fig. [7| Note that, in general, 
even better results could probably have been obtained by 
using a smaller bandwidth and more iterations (as is done 
in Fig. |9}. 

The algorithms are also compared on the Challenge image. 
In Fig. |§J the original bilateral filter is tested across a range 
of free parameter values. The free parameters are the usual 
<tr (standard deviation in the range dimension) and a 
(standard deviation in the spatial dimension). The effects of 
color cancellation are very prominent across all parameter 
ranges, and the noisy colored region is never satisfactory 
smoothed out. 

The results using a range of parameters with the edge 
aware version is shown in Fig. [9] Note that these parameters 
are completely different from the free parameters of the 
original bilateral filter - or is the combined spatial-range 
parameter which is used on shortest path length. The 
spatial parameter is essentially irrelevant - we simply use 
a large value so that all path-lengths go to zero before 
reaching the boundary. However, we do introduce a new 
parameter i, number of iterations, which can be used to 
tweak the 'spatial' influence somewhat. 



8 Edge Aware Modifications 
for Mean Shift 

Because the mean shift filter also uses a bilateral weighted 
average, the computation of the mean shift vector can 
be done in an edge aware way by simply adopting the 
same modification that was proposed for the bilateral filter 
| |5.2| . However, we found that this additional computation 
does not improve the filtering result because the spatial 
bandwidth is typically small enough that multiple similarly 
colored distributions are not likely to occur within the 
support. 

The more critical violation of edge awareness that that 
we wanted to fix occurs when a concave feature causes 
the mean to shift off of the feature. In order to fix this 
problem, we propose to compute the spatial component of 
the mean shift as normal, but to shift to a location that is a 
compromise between the mean shift vector and the image 
feature. There are many conceivable ways to accomplish 
this, and the method we use is by no means optimal, but 
it works well in all of our tests. 

Specifically, we begin by defining the set S to contain all 
pixels in the image with similar color to the current mean, 



S >efl: l]Ip h2 S ^ <r 



(8.1) 



Then, if the next arithmetic spatial mean is /i, we shift to 
the spatial point p defined by 



argmin(p — /i) 2 

pes 



(8.2) 



We found that using r = 0.5 works well in general, and 
we use this value for all of our experiments. 

8.1 Performance Analysis of Edge Aware Mean Shift 

Finding the mode for each pixel requires 0(di) time, where 
d is the number of pixels in the local neighborhood (de- 
termined by h s ) and i is the number of iterations. Because 
the mean shift procedure is guaranteed to converge (31, i 
is finite (for real images, the average number of iterations 
is almost always somewhere in the range of [1.1, 1.8] using 
the optimization mentioned below). 

One of the optimizations proposed for the mean shift 
filter by Christoudias et. al |2j is to store the found mode for 
each pixel that falls on a mean shift path. Then, if another 
mean shift path falls upon a pixel that was previously on 
another path, the new mean shift path uses the previously 
found mode. 

This makes the algorithm order-dependent and no longer 
guarantees that each pixel will be assigned to the correct 
mode of the kernel density estimate, but they found it to 
produce very good results about 5-6 times faster (although 
algorithmic time complexity is unchanged). This optimiza- 
tion is used in our implementation of mean shift as well. 

8.2 Experimental Results of Edge Aware Mean Shift 
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A comparison of the filtered results is shown in Fig 
Figs. 11 and 12 show results on the Challenge image using 
the original and modified algorithms, respectively. 
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Froud 



Bilateral 



Edge Aware Bilateral 




Bilateral 



Edge Aware Bilateral 



Fig. 7. Comparison between bilateral filtered results on input image (left column) using standard algorithm (middle column) and edge aware modified 
algorithm (right column), as = 10 and a R = 55 for all tests (o\r is used as the combined spatial-range parameter in the edge aware version). 
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Paris 



Mean Shift 



Edge Aware Mean Shift 




Nest 



Mean Shift 



Edge Aware Mean Shift 




Mean Shift 



Edge Aware Mean Shift 



Fig. 10. Comparison between mean shift filtered results on input image (left column) using standard algorithm (middle column) and edge aware modified 
algorithm (right column). h s = 11 and h r = 55 in all tests. 




h a = 3,h r = 30 h s = 5,h r = 30 h a = 7 ,h r =30 h s = 9,h r = 30 




h s = 3,h r =90 h s = 5,hr = 90 h s = 7,h r =90 h s = 9,h r = 90 
Fig. 1 1 . Standard mean shift filtered result on Challenge test image. 




h a = 3,h r = 60 h s = 5,h r = 60 h s = 7,h r = 60 h s = 9,h r = 60 




h s = 3,h r =90 h s = 5,hr = 90 h s = 7,h r =90 h s = 9,/i r = 90 



Fig. 12. Proposed edge aware mean shift filtered result on Challenge test image. 
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The most notable difference between the original and 
edge aware modified filter results shown in Fig. 10 is that 
the edge aware version significantly cuts down on the 
ragged edges. A better looking filtered image could have 
been produced by using a smaller spatial bandwidth, but 
that changes the scale that is used to find modes in the 
kernel density estimate. The purpose of this test was to 
allow larger spatial bandwidths to be used to find large 
scale modes in homogeneous regions while still preserving 
fine detail in other regions. The modified filter accomplishes 
this. 

In the Challenge test, the original algorithm creates many 
unsightly black artifacts (Fig. [TTJ, and also causes some of 
the white pixels to get assigned into the red block at larger 
spatial bandwidths. The edge aware version preserves the 
strong black boundary lines while still clustering the under- 
lying colors according to the kernel density estimate (Fig. 

As the scale grows larger, the edge aware version be- 
comes progressively less "edge aware", and eventually per- 
forms blurring across boundaries. A method for preventing 
this effect was mentioned in Section|8j but it was specifically 
not used because it disturbs the kernel density estimate too 
much. 

9 Conclusions 

Several successful approaches to the discontinuity preserv- 
ing smoothing problem have been reviewed, illustrating 
an interesting conceptual dichotomy between viewing it 
as a global optimization problem vs. a class of algorithms 
related to diffusion. The global optimization approaches 
all minimize the same fundamental form using very dif- 
ferent methods. Variable conductance diffusion is a direct 
application of physical diffusion, and anisotropic diffusion 
is a simple variant of variable conductance diffusion. The 
bilateral filter is a non-iterative approximation to diffusion, 
and we showed that the mean shift filter is an iterative 
generalization of the bilateral filter in higher dimensions. 

We showed that edge awareness is a desirable property 
for unbiased discontinuity preserving smoothing for noise 
reduction, and demonstrated examples where each of the 
above algorithms violated the principle of edge awareness 
resulting in degraded image quality: the diffusion algorithm 
originally had problems maintaining edges after successive 
iterations, the bilateral filter had issues with color bleeding 
and loss of saturation (color cancellation), and the mean 
shift filter created significant artifacts for non-convex col- 
ored regions when using large spatial bandwidths. 

Finally, we proposed individual modifications to each 
algorithm to account for edge awareness and showed how 
these modifications could be implemented with little or no 
change in algorithm complexity. After making the changes, 
visual improvements were easily visible in the smoothed 
results. 
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